X Variables

Week 11

Evie Zhang

Old Dominion University

Topics

  • Distributed Lag Models
  • Spurious Regression
  • Cointegration

GDP and the Yield Curve

GDP and the Yield Curve

Code
tmp <- read.csv("../data/fredgraph (2).csv")
colnames(tmp) <- c("DATE", "gdp_pc", "ir3m", "ir10y", "gdp")
rbind(head(tmp, 5), tail(tmp, 5))
          DATE  gdp_pc ir3m ir10y       gdp
1   1962-01-01 2.30808 2.72  3.86   594.013
2   1962-04-01 1.06951 2.73  4.00   600.366
3   1962-07-01 1.44262 2.78  3.94   609.027
4   1962-10-01 0.53413 2.87  3.85   612.280
5   1963-01-01 1.53394 2.89  3.95   621.672
208 2013-10-01 1.31302 0.07  3.04 17133.114
209 2014-01-01 0.06518 0.05  2.73 17144.281
210 2014-04-01 1.85731 0.04  2.53 17462.703
211 2014-07-01 1.60642 0.02  2.52 17743.227
212 2014-10-01 0.61608 0.03  2.17 17852.540

GDP and the Yield Curve

Once you have your data, you might need to add/modify variables.

  • Do this before converting to a ts() object.
Code
tmp$spread <- tmp$ir10y - tmp$ir3m

GDP and the Yield Curve

You can also do this after:

Code
tmp <- ts(tmp, start = c(1962, 1), frequency = 4)
# ts.union vs ts.intersect
tmp <- ts.union(tmp, spread = tmp[,"ir10y"] - tmp[,"ir3m"], dframe = FALSE)
rbind(head(tmp, 3), tail(tmp, 3))
     tmp.DATE tmp.gdp_pc tmp.ir3m tmp.ir10y   tmp.gdp tmp.spread spread
[1,]        1    2.30808     2.72      3.86   594.013       1.14   1.14
[2,]        2    1.06951     2.73      4.00   600.366       1.27   1.27
[3,]        3    1.44262     2.78      3.94   609.027       1.16   1.16
[4,]      210    1.85731     0.04      2.53 17462.703       2.49   2.49
[5,]      211    1.60642     0.02      2.52 17743.227       2.50   2.50
[6,]      212    0.61608     0.03      2.17 17852.540       2.14   2.14

Code
# fixed = TRUE treats "tmp." literally (otherwise "." matches any character)
colnames(tmp) <- gsub("tmp.", "", colnames(tmp), fixed = TRUE)
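The `# ts.union vs ts.intersect` comment above is worth unpacking: ts.union() keeps the union of the two time ranges and pads non-overlapping periods with NA, while ts.intersect() keeps only the overlap. A minimal base-R illustration on two toy quarterly series:

```r
# Two quarterly series with partially overlapping spans
a <- ts(1:8, start = c(2000, 1), frequency = 4)  # 2000 Q1 - 2001 Q4
b <- ts(1:8, start = c(2000, 3), frequency = 4)  # 2000 Q3 - 2002 Q2

u <- ts.union(a, b)      # union: 2000 Q1 - 2002 Q2, NA where a or b is missing
i <- ts.intersect(a, b)  # intersection: 2000 Q3 - 2001 Q4 only

nrow(u)  # 10 quarters, 4 rows contain an NA
nrow(i)  # 6 quarters, no NAs
```

This matters for modeling: functions like dynlm() silently drop rows with NAs, so the choice between union and intersection determines your estimation sample.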

Distributed Lags

What if you need to manipulate the data?

Code
head(ts.union(y = tmp[,"gdp_pc"],
              diff_y = diff(tmp[,"gdp"]),
              pc_y = 100*diff(tmp[,"gdp"])/tmp[,"gdp"]), 10)
              y diff_y      pc_y
1962 Q1 2.30808     NA        NA
1962 Q2 1.06951  6.353 1.0581878
1962 Q3 1.44262  8.661 1.4221044
1962 Q4 0.53413  3.253 0.5312929
1963 Q1 1.53394  9.392 1.5107645
1963 Q2 1.29972  8.080 1.2830448
1963 Q3 2.33298 14.692 2.2797947
1963 Q4 1.47321  9.494 1.4518196
1964 Q1 2.42898 15.884 2.3713763
1964 Q2 1.32155  8.852 1.3043081
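One subtlety in pc_y above: because arithmetic on ts objects aligns series by date, diff(gdp) at time \(t\) is divided by gdp at time \(t\), giving \((gdp_t - gdp_{t-1})/gdp_t\) rather than the conventional growth rate \((gdp_t - gdp_{t-1})/gdp_{t-1}\) (check the first row: 6.353/600.366 = 1.0582). To divide by the lagged level instead, shift the denominator with stats::lag(). A base-R sketch on a toy series:

```r
x <- ts(c(100, 110, 121), start = c(2000, 1), frequency = 4)

# As computed above: the change divided by the *current* level
pc_current <- 100 * diff(x) / x                 # 10/110, 11/121

# Conventional growth rate: the change divided by the *lagged* level
pc_lagged  <- 100 * diff(x) / stats::lag(x, -1) # 10/100, 11/110

as.numeric(pc_lagged)  # both quarters grow by exactly 10%
```

For small quarterly changes the two versions are nearly identical, which is why the pc_y column above still tracks gdp_pc closely.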

GDP and the Yield Curve

Code
# alpha() below is from the scales package
plot(tmp[,c("gdp_pc")], ylim = range(tmp[,c("gdp_pc", "spread")]),
     col = alpha("tomato", .6), lwd = 2)
lines(tmp[,"spread"], col = alpha("dodgerblue", .6), lwd = 2)
abline(h = 0)
legend("topright", lty = 1, col = c("tomato", "dodgerblue"),
       legend = c("%C GDP", "Spread"), bty = "n")

GDP and the Yield Curve

Check for stationarity:

Code
adf.test(tmp[,"gdp_pc"]) # adf.test() is from the tseries package
adf.test(tmp[,"spread"])

    Augmented Dickey-Fuller Test

data:  tmp[, "gdp_pc"]
Dickey-Fuller = -5.0107, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary


    Augmented Dickey-Fuller Test

data:  tmp[, "spread"]
Dickey-Fuller = -4.9057, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary

Code
acf(tmp[,c("gdp_pc", "spread")])

Code
pacf(tmp[,c("gdp_pc", "spread")])

Distributed Lags

There are two (main) options for estimating DL models:

dynlm, dynlm()

  • Data can be a ts() object.
  • Lags are (IMO) more intuitive.
  • No method for prediction.

dLagM, ardlDlm()

  • Data must be a data.frame object.
  • Lags are less intuitive.
  • Cannot be (easily) used with stargazer.
Code
library("pacman")
p_load(dynlm, dLagM)

dynlm

Distributed Lags

Main model: dynlm()

  • To use lags, use L()
    • L(var, i:j)
    • i:j is a vector for the lag structure. This can be any vector.
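Under the hood, L(x, i) in a dynlm formula is x lagged i periods, i.e. stats::lag(x, -i) in base R, and L(x, i:j) expands into one regressor per lag (the vector really can be arbitrary, e.g. c(1, 4)). A base-R check of the alignment on a toy series:

```r
x <- ts(1:6, start = c(2000, 1), frequency = 4)

# L(x, 2) in a dynlm formula is the value from two quarters earlier,
# equivalent to stats::lag(x, -2): the series shifted forward in time
x_l2 <- stats::lag(x, -2)

# At 2000 Q3 the original series is 3, but the lagged series holds
# the 2000 Q1 value:
window(x,    start = c(2000, 3), end = c(2000, 3))  # 3
window(x_l2, start = c(2000, 3), end = c(2000, 3))  # 1
```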

Distributed Lags

Code
r1 <- dynlm(tmp[,"gdp_pc"] ~ tmp[,"spread"])
r2 <- lm(tmp[,"gdp_pc"] ~ tmp[,"spread"])
stargazer::stargazer(r1, r2, type = "html")
                                    Dependent variable: tmp[, "gdp_pc"]
                                     dynamic OLS           linear
                                         (1)                 (2)
tmp[, "spread"]                       -0.166***           -0.166***
                                       (0.051)             (0.051)
Constant                               1.888***            1.888***
                                       (0.101)             (0.101)
Observations                             212                 212
R2                                      0.048               0.048
Adjusted R2                             0.044               0.044
Residual Std. Error (df = 210)          0.938               0.938
F Statistic (df = 1; 210)             10.659***           10.659***
Note: *p<0.1; **p<0.05; ***p<0.01

Distributed Lags

Code
r1 <- dynlm(tmp[,"gdp_pc"] ~ tmp[,"spread"] + L(tmp[,"spread"]))
r2 <- dynlm(tmp[,"gdp_pc"] ~ L(tmp[,"spread"], 0:1))
r3 <- dynlm(gdp_pc ~ L(spread, 0:1), tmp)
stargazer::stargazer(r1, r2, r3, type = "html")
                                         Dependent variable:
                               tmp[, "gdp_pc"]                 gdp_pc
                               (1)          (2)                  (3)
tmp[, "spread"]             -0.229***
                             (0.083)
L(tmp[, "spread"])            0.081
                             (0.083)
L(tmp[, "spread"], 0:1)0                 -0.229***
                                          (0.083)
L(tmp[, "spread"], 0:1)1                   0.081
                                          (0.083)
L(spread, 0:1)0                                              -0.229***
                                                              (0.083)
L(spread, 0:1)1                                                0.081
                                                              (0.083)
Constant                     1.858***     1.858***            1.858***
                             (0.105)      (0.105)             (0.105)
Observations                   211          211                 211
R2                            0.052        0.052               0.052
Adjusted R2                   0.043        0.043               0.043
Residual Std. Error (df = 208) 0.939       0.939               0.939
F Statistic (df = 2; 208)    5.743***     5.743***            5.743***
Note: *p<0.1; **p<0.05; ***p<0.01

Distributed Lags

Code
r1 <- dynlm(gdp_pc ~ L(spread, 0), tmp)
r2 <- dynlm(gdp_pc ~ L(spread, 0:4), tmp)
r3 <- dynlm(gdp_pc ~ L(spread, 0:8), tmp)
stargazer::stargazer(r1, r2, r3,
                     add.lines = list(c("AIC",
                                        round(AIC(r1), 2),
                                        round(AIC(r2), 2),
                                        round(AIC(r3), 2))),
                     omit.stat = c("ser", "f"),
                     type = "html")
                                   Dependent variable: gdp_pc
                                 (1)          (2)          (3)
L(spread, 0)                  -0.166***
                               (0.051)
L(spread, 0:4)0                            -0.236***
                                            (0.085)
L(spread, 0:4)1                             -0.076
                                            (0.107)
L(spread, 0:4)2                             0.256**
                                            (0.105)
L(spread, 0:4)3                             -0.021
                                            (0.107)
L(spread, 0:4)4                             -0.051
                                            (0.085)
L(spread, 0:8)0                                         -0.249***
                                                         (0.088)
L(spread, 0:8)1                                          -0.051
                                                         (0.113)
L(spread, 0:8)2                                           0.208*
                                                         (0.113)
L(spread, 0:8)3                                           0.035
                                                         (0.116)
L(spread, 0:8)4                                          -0.122
                                                         (0.117)
L(spread, 0:8)5                                           0.139
                                                         (0.117)
L(spread, 0:8)6                                          -0.129
                                                         (0.113)
L(spread, 0:8)7                                           0.075
                                                         (0.113)
L(spread, 0:8)8                                          -0.052
                                                         (0.088)
Constant                      1.888***     1.837***      1.867***
                               (0.101)      (0.113)       (0.129)
AIC                            578.35       568.22        566.86
Observations                    212          208           204
R2                             0.048        0.089         0.098
Adjusted R2                    0.044        0.067         0.056
Note: *p<0.1; **p<0.05; ***p<0.01

Distributed Lags

Code
# Trim the samples so all three models use the same 204 observations,
# making their AICs comparable.
r1 <- dynlm(gdp_pc ~ L(spread, 0),
            ts(tmp[9:212,], end = c(2014, 4), frequency = 4))
r2 <- dynlm(gdp_pc ~ L(spread, 0:4),
            ts(tmp[5:212,], end = c(2014, 4), frequency = 4))
r3 <- dynlm(gdp_pc ~ L(spread, 0:8), tmp)
stargazer::stargazer(r1, r2, r3,
                     add.lines = list(c("AIC",
                                        round(AIC(r1), 2),
                                        round(AIC(r2), 2),
                                        round(AIC(r3), 2))),
                     omit.stat = c("ser", "f"),
                     type = "html")
                                   Dependent variable: gdp_pc
                                 (1)          (2)          (3)
L(spread, 0)                  -0.168***
                               (0.052)
L(spread, 0:4)0                            -0.236***
                                            (0.086)
L(spread, 0:4)1                             -0.078
                                            (0.108)
L(spread, 0:4)2                             0.256**
                                            (0.106)
L(spread, 0:4)3                             -0.020
                                            (0.108)
L(spread, 0:4)4                             -0.051
                                            (0.086)
L(spread, 0:8)0                                         -0.249***
                                                         (0.088)
L(spread, 0:8)1                                          -0.051
                                                         (0.113)
L(spread, 0:8)2                                           0.208*
                                                         (0.113)
L(spread, 0:8)3                                           0.035
                                                         (0.116)
L(spread, 0:8)4                                          -0.122
                                                         (0.117)
L(spread, 0:8)5                                           0.139
                                                         (0.117)
L(spread, 0:8)6                                          -0.129
                                                         (0.113)
L(spread, 0:8)7                                           0.075
                                                         (0.113)
L(spread, 0:8)8                                          -0.052
                                                         (0.088)
Constant                      1.901***     1.840***      1.867***
                               (0.104)      (0.115)       (0.129)
AIC                            561.41       560.82        566.86
Observations                    204          204           204
R2                             0.050        0.089         0.098
Adjusted R2                    0.045        0.066         0.056
Note: *p<0.1; **p<0.05; ***p<0.01

Distributed Lags

Code
r1 <- dynlm(gdp_pc ~ L(gdp_pc, 1:2), tmp)
r2 <- dynlm(gdp_pc ~ L(spread, 0:2), tmp)
r3 <- dynlm(gdp_pc ~ L(gdp_pc, 1:2) + L(spread, 0:2), tmp)
stargazer::stargazer(r1, r2, r3,
                     add.lines = list(c("AIC",
                                        round(AIC(r1), 2),
                                        round(AIC(r2), 2),
                                        round(AIC(r3), 2))),
                     omit.stat = c("ser", "f"),
                     type = "html")
                                   Dependent variable: gdp_pc
                                 (1)          (2)          (3)
L(gdp_pc, 1:2)1               0.323***                  0.300***
                               (0.067)                   (0.067)
L(gdp_pc, 1:2)2               0.275***                  0.272***
                               (0.067)                   (0.068)
L(spread, 0:2)0                            -0.233***     -0.121
                                            (0.082)      (0.074)
L(spread, 0:2)1                             -0.086       -0.054
                                            (0.104)      (0.092)
L(spread, 0:2)2                             0.215***    0.192***
                                            (0.082)      (0.073)
Constant                      0.654***     1.796***      0.673***
                               (0.129)      (0.107)       (0.174)
AIC                            523.84       570.27        520.33
Observations                    210          210           210
R2                             0.258        0.083         0.291
Adjusted R2                    0.251        0.070         0.273
Note: *p<0.1; **p<0.05; ***p<0.01

dLagM

Distributed Lags

Main model: ardlDlm()

  • To use lags, use p and q.
    • p denotes lags of \(X\)
    • q denotes lags of \(Y\)
    • Use remove to omit lags you are not interested in.

Distributed Lags

Code
rmv.q <- c(1) # y-vars
rmv.p <- list(spread = c()) # x-vars
rmv <- list(p = rmv.p, q = rmv.q)
r1 <- ardlDlm(formula = gdp_pc ~ spread,
              data = as.data.frame(tmp),
              p = 2,
              q = 1,
              remove = rmv)
summary(r1)

Time series regression with "ts" data:
Start = 3, End = 212

Call:
dynlm(formula = as.formula(model.text), data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.4535 -0.5826 -0.0839  0.4303  4.2882 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.79552    0.10671  16.827  < 2e-16 ***
spread.t    -0.23343    0.08199  -2.847  0.00486 ** 
spread.1    -0.08588    0.10367  -0.828  0.40844    
spread.2     0.21504    0.08209   2.620  0.00946 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9274 on 206 degrees of freedom
Multiple R-squared:  0.08314,   Adjusted R-squared:  0.06978 
F-statistic: 6.226 on 3 and 206 DF,  p-value: 0.0004562
Code
rmv.q <- c() # y-vars
rmv.p <- list(spread = c(1)) # x-vars
rmv <- list(p = rmv.p, q = rmv.q)
r2 <- ardlDlm(formula = gdp_pc ~ spread,
              data = as.data.frame(tmp),
              p = 1,
              q = 2,
              remove = rmv)
summary(r2)

Time series regression with "ts" data:
Start = 3, End = 212

Call:
dynlm(formula = as.formula(model.text), data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.9672 -0.4680 -0.0485  0.3921  4.0502 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.75420    0.17331   4.352 2.12e-05 ***
spread.t    -0.04155    0.04794  -0.867 0.387132    
gdp_pc.1     0.31456    0.06766   4.649 5.96e-06 ***
gdp_pc.2     0.26149    0.06862   3.810 0.000183 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8328 on 206 degrees of freedom
Multiple R-squared:  0.2607,    Adjusted R-squared:  0.2499 
F-statistic: 24.21 on 3 and 206 DF,  p-value: 1.835e-13
Code
rmv.q <- c() # y-vars
rmv.p <- list(spread = c()) # x-vars
rmv <- list(p = rmv.p, q = rmv.q)
r3 <- ardlDlm(formula = gdp_pc ~ spread,
              data = as.data.frame(tmp),
              p = 2,
              q = 2,
              remove = rmv)
summary(r3)

Time series regression with "ts" data:
Start = 3, End = 212

Call:
dynlm(formula = as.formula(model.text), data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.9803 -0.5649 -0.0341  0.4066  4.0575 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.67306    0.17356   3.878 0.000142 ***
spread.t    -0.12124    0.07392  -1.640 0.102518    
spread.1    -0.05366    0.09178  -0.585 0.559402    
spread.2     0.19217    0.07283   2.639 0.008967 ** 
gdp_pc.1     0.29954    0.06685   4.481 1.24e-05 ***
gdp_pc.2     0.27183    0.06763   4.019 8.21e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8196 on 204 degrees of freedom
Multiple R-squared:  0.2908,    Adjusted R-squared:  0.2735 
F-statistic: 16.73 on 5 and 204 DF,  p-value: 7.663e-14
Code
rmv.q <- c(1) # y-vars
rmv.p <- list(spread = c(0,1)) # x-vars
rmv <- list(p = rmv.p, q = rmv.q)
r4 <- ardlDlm(formula = gdp_pc ~ spread,
              data = as.data.frame(tmp),
              p = 2,
              q = 2,
              remove = rmv)
summary(r4)

Time series regression with "ts" data:
Start = 3, End = 212

Call:
dynlm(formula = as.formula(model.text), data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.3836 -0.5389 -0.0403  0.4194  3.9753 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.80640    0.15393   5.239 3.97e-07 ***
spread.2     0.07091    0.04848   1.463    0.145    
gdp_pc.2     0.43832    0.06443   6.803 1.08e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8735 on 207 degrees of freedom
Multiple R-squared:  0.1827,    Adjusted R-squared:  0.1748 
F-statistic: 23.14 on 2 and 207 DF,  p-value: 8.521e-10

A Note: Granger Causality

Granger Causality is when lags of \(X\) predict future values of \(Y\).

Granger Causality \(\neq\) (Mixtape) Causality!

  • Term spread does not cause GDP to decrease.
  • Term spread today is correlated with future changes in GDP.
    • It is more likely that investor sentiment about future GDP causes term spread to change.
    • GDP causes Spread
    • Spread “Granger causes” GDP
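lmtest::grangertest() (used below) is just an F-test comparing a restricted model (lags of \(Y\) only) against an unrestricted one (lags of \(Y\) and \(X\)). A base-R sketch on simulated data, where x genuinely leads y (names and parameter values are illustrative):

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- c(0, 0.8 * x[-n]) + rnorm(n)  # y_t depends on x_{t-1}

# Align y_t with y_{t-1} and x_{t-1}
d <- data.frame(y = y[-1], y_l1 = y[-n], x_l1 = x[-n])

restricted   <- lm(y ~ y_l1,        data = d)  # lags of y only
unrestricted <- lm(y ~ y_l1 + x_l1, data = d)  # add lags of x

anova(restricted, unrestricted)  # small p-value: x "Granger causes" y
```

This is exactly the comparison grangertest() prints as "Model 1" and "Model 2" in its output.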

A Note: Granger Causality

Code
lmtest::grangertest(gdp_pc ~ spread, tmp, order = 1)
lmtest::grangertest(gdp_pc ~ spread, tmp, order = 2)
Granger causality test

Model 1: gdp_pc ~ Lags(gdp_pc, 1:1) + Lags(spread, 1:1)
Model 2: gdp_pc ~ Lags(gdp_pc, 1:1)
  Res.Df Df      F Pr(>F)
1    208                 
2    209 -1 0.3653 0.5462
Granger causality test

Model 1: gdp_pc ~ Lags(gdp_pc, 1:2) + Lags(spread, 1:2)
Model 2: gdp_pc ~ Lags(gdp_pc, 1:2)
  Res.Df Df      F  Pr(>F)  
1    205                    
2    207 -2 3.3511 0.03698 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Forecasting with X

Code
forecast(model,
         x,
         h = 1,
         interval = FALSE,
         level = 0.95)
  • model: a fitted ardlDlm object
  • x: values for \(X_{t+1}, ..., X_{t+h}\)
  • h: similar to n.ahead

Forecasting with X

We are interested in forecasting GDP. Let's write down the equations we believe describe the data:

\[\begin{align}\%\Delta \text{GDP}_t &= \alpha + \sum_{i = 1}^{2}\beta_i \%\Delta \text{GDP}_{t-i} + \gamma\text{S}_{t-2} + e_t \\ S_t &\sim ARIMA(p, d, q)\end{align}\]

We can fit both of these using auto.arima() and ardlDlm(). We will forecast 2013 and 2014.

Forecasting with X

Step 1: Generate a model for \(\%\Delta \text{GDP}_t\).

  • Use ardlDlm() to estimate the parameters.

  • The structure of the model is determined from previous results.

Code
c_namez <- colnames(tmp)
tmp <- ts.union(tmp, h = ifelse(time(tmp) >= 2013, 1, 0))
colnames(tmp) <- c(c_namez, "h")
lim <- tmp[,"h"] == 0

rmv.q <- c() # y-vars
rmv.p <- list(spread = c(0,1)) # x-vars
rmv <- list(p = rmv.p, q = rmv.q)
main <- ardlDlm(formula = gdp_pc ~ spread,
              data = as.data.frame(tmp[lim,]),
              p = 2,
              q = 2,
              remove = rmv)
summary(main)

Time series regression with "ts" data:
Start = 3, End = 204

Call:
dynlm(formula = as.formula(model.text), data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.0118 -0.5101 -0.0496  0.3926  3.9909 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.48808    0.16204   3.012  0.00293 ** 
spread.2     0.08388    0.04648   1.805  0.07264 .  
gdp_pc.1     0.33251    0.06776   4.907 1.92e-06 ***
gdp_pc.2     0.29455    0.06887   4.277 2.95e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8306 on 198 degrees of freedom
Multiple R-squared:  0.2725,    Adjusted R-squared:  0.2615 
F-statistic: 24.72 on 3 and 198 DF,  p-value: 1.254e-13

Forecasting with X

Step 2: Generate a model for \(S_t\).

  • Use auto.arima() for simplicity.

  • Forecasts from this model will be used in future steps.

Code
sub <- auto.arima(tmp[lim,"spread"])
summary(sub)
Series: tmp[lim, "spread"] 
ARIMA(1,1,2) 

Coefficients:
          ar1     ma1      ma2
      -0.5572  0.4556  -0.3120
s.e.   0.1610  0.1575   0.0695

sigma^2 = 0.6389:  log likelihood = -241.21
AIC=490.42   AICc=490.63   BIC=503.68

Training set error measures:
                      ME      RMSE       MAE      MPE     MAPE     MASE
Training set 0.003708918 0.7914635 0.5326302 4.411761 69.15104 1.014201
                    ACF1
Training set -0.00432909

Forecasting with X

Code
p <- predict(sub, n.ahead = 8)
plot(tmp[,"spread"], lwd = 1,
     ylim = c(-1, 4),
     xlim = c(2000, 2015),
     xlab = "Quarter", ylab = "Percentage Points",
     main = "Term Spread (10Y - 3M)")
lines(seq(2013, 2014.75, by = 1/4),
      tmp[!lim,"spread"], col = "dodgerblue", lwd = 3)
lines(seq(2013, 2014.75, by = 1/4),
      p$pred, col = "tomato", lwd = 3)
legend("bottomright", bty = "n", lty = 1,
       col = c("tomato", "dodgerblue"),
       legend = c("Forecast", "Actual"))

Forecasting with X

Code
r1 <- auto.arima(tmp[lim,"gdp_pc"])
p1 <- predict(r1, n.ahead = 8)

forecast(model = main,
         x = p$pred,
         h = 8,
         interval = FALSE) -> f
plot(tmp[,"gdp_pc"],
     ylim = c(-2, 3),
     xlim = c(2000, 2015),
     xlab = "Quarter", ylab = "Percentage Points",
     main = "Percent Change in GDP")
lines(seq(2013, 2014.75, by = 1/4),
      tmp[!lim,"gdp_pc"], col = "dodgerblue", lwd = 3)
lines(seq(2013, 2014.75, by = 1/4),
      f$forecasts, col = "tomato", lwd = 3)
lines(seq(2013, 2014.75, by = 1/4), lwd = 3,
      as.numeric(p1$pred), col = "mediumseagreen")
legend("bottomright", bty = "n", lty = 1,
       col = c("tomato", "mediumseagreen", "dodgerblue"),
       legend = c("ARDL Forecast", "ARIMA Forecast" , "Actual"))
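The two-step logic above (fit a model for \(X\), forecast \(X\), feed those forecasts into the \(Y\) equation) can be sketched with base-R tools only; here ar() stands in for auto.arima() and lm() for ardlDlm(), on simulated data with illustrative names and parameters:

```r
set.seed(42)
n <- 120; h <- 8
x <- arima.sim(list(ar = 0.7), n)      # stand-in for the spread
y <- 1 + 0.5 * c(0, x[-n]) + rnorm(n)  # y_t depends on x_{t-1}

# Step 1: model for y given lagged x (stand-in for ardlDlm)
d   <- data.frame(y = y[-1], x_l1 = as.numeric(x[-n]))
m_y <- lm(y ~ x_l1, data = d)

# Step 2: model for x, then forecast it h steps ahead (stand-in for auto.arima)
m_x <- ar(x)
x_f <- predict(m_x, n.ahead = h)$pred

# Step 3: feed the x path into the y equation.
# y_{t+1} uses the last *observed* x; later steps use forecasted x.
x_path <- c(x[n], x_f[-h])
y_f <- predict(m_y, newdata = data.frame(x_l1 = as.numeric(x_path)))
length(y_f)  # h forecasts
```

Because the \(X\) forecasts are treated as known, the uncertainty in the \(Y\) forecasts is understated; a proper interval would propagate the ARIMA forecast error through the ARDL equation.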

Forecasting with X

Code
c_namez <- colnames(tmp)
tmp <- ts.intersect(tmp, spread_l2 = lag(tmp[,"spread"], -2))
colnames(tmp) <- c(c_namez, "spread_l2")

r1 <- auto.arima(tmp[,"gdp_pc"])
r2 <- auto.arima(tmp[,"gdp_pc"], xreg = tmp[,"spread_l2"])

Forecasting with X

Code
r1
Series: tmp[, "gdp_pc"] 
ARIMA(0,1,3)(2,0,1)[4] 

Coefficients:
          ma1     ma2      ma3    sar1     sar2     sma1
      -0.7283  0.0028  -0.1612  0.3949  -0.1987  -0.3020
s.e.   0.0688  0.0891   0.0660  0.2215   0.0707   0.2222

sigma^2 = 0.6461:  log likelihood = -248.75
AIC=511.5   AICc=512.06   BIC=534.9
Code
r2
Series: tmp[, "gdp_pc"] 
Regression with ARIMA(0,1,2)(2,0,2)[4] errors 

Coefficients:
          ma1      ma2    sar1     sar2     sma1    sma2    drift    xreg
      -0.7207  -0.1499  0.3426  -0.5230  -0.2175  0.3599  -0.0054  0.1829
s.e.   0.0639   0.0711  0.3713   0.2233   0.4182  0.2599   0.0071  0.0589

sigma^2 = 0.6321:  log likelihood = -245.38
AIC=508.76   AICc=509.67   BIC=538.84

Cointegration

Cointegration

[Animated GIFs: an old lady following a random walk, a boy following a random walk, and an old man walking a dog.]

Cointegration

Suppose we want to model consumption as a function of income. If we assume consumption is some constant fraction of income (according to economic theory), we could write the following model:

\[C_t = kY_t\]

We can rewrite this as:

\[ \begin{align}log(C_t) &= log(kY_t) \\ &= log(k) + log(Y_t)\end{align}\]

Cointegration

We can also say that:

\[\begin{align} \Delta log(C_t) &= log(C_t) - log(C_{t-1})\\ &= log(k) + log(Y_t) - (log(k) + log(Y_{t-1})) \\ &= log(k)-log(k) + log(Y_t) - log(Y_{t-1}) \\ &= \Delta log(Y_{t})\end{align}\]

Cointegration

Let’s suppose we estimated the following model (in logs):

\[C_t = \beta_0 + \beta_1 Y_t + \beta_2 Y_{t-1} + \alpha C_{t-1}\]

In the long run, we should find an equilibrium \(C^*\) and \(Y^*\):

\[C^* = \beta_0 + \beta_1 Y^* + \beta_2 Y^* + \alpha C^*\]

Simplifying:

\[(1-\alpha)C^* = \beta_0 + (\beta_1 + \beta_2) Y^*\]

Cointegration

\[(1-\alpha)C^* = \beta_0 + (\beta_1 + \beta_2) Y^*\]

Since \(\Delta log(C_t) = \Delta log(Y_t)\), then \(1-\alpha = \beta_1 + \beta_2\).

Let \(\gamma = 1-\alpha = \beta_1 + \beta_2\).

\[\begin{align}\alpha &= 1 - \gamma \\ \beta_2 &= \gamma - \beta_1\end{align}\]

Cointegration

\[C_t = \beta_0 + \beta_1 Y_t + \color{red}{\beta_2} Y_{t-1} + \color{red}{\alpha} C_{t-1}\]

\[ C_t = \beta_0 + \beta_1 Y_t + (\gamma - \beta_1) Y_{t-1} + (1-\gamma) C_{t-1} \]

\[C_t - C_{t-1} = \beta_0 + \beta_1 Y_t - \beta_1 Y_{t-1} + \gamma Y_{t-1} - \gamma C_{t-1}\]

\[C_t - C_{t-1} = \beta_0 + \beta_1 (Y_t - Y_{t-1}) + \gamma (Y_{t-1} - C_{t-1})\]

\[\Delta C_t = f(\Delta Y_t, Y_{t-1} - C_{t-1})\]

Where:

  • \(\Delta Y_t\) is the short run relationship
  • \(Y_{t-1} - C_{t-1}\) is the long run equilibrium
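The \(\Delta C_t = f(\Delta Y_t, Y_{t-1} - C_{t-1})\) form can be checked on simulated data: generate a random-walk \(Y\), let \(C\) error-correct toward it, and a regression of \(\Delta C\) on \(\Delta Y\) and the lagged gap recovers the correction speed (base R, hypothetical parameter values):

```r
set.seed(7)
n <- 500
Y <- cumsum(rnorm(n))  # random-walk "income"
C <- numeric(n)
for (t in 2:n) {
  # C tracks changes in Y and closes 30% of last period's gap
  C[t] <- C[t - 1] + 0.5 * (Y[t] - Y[t - 1]) +
          0.3 * (Y[t - 1] - C[t - 1]) + rnorm(1, sd = 0.2)
}

dC  <- diff(C)         # short-run change in C
dY  <- diff(Y)         # short-run change in Y
gap <- (Y - C)[-n]     # Y_{t-1} - C_{t-1}, the equilibrium error

ecm <- lm(dC ~ dY + gap)
coef(ecm)[["gap"]]     # close to the true correction speed, 0.3
```

Both series are individually \(I(1)\), yet the gap between them is stationary; that is the defining feature of cointegration.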

Cointegration

Idea: Two or more non-stationary processes may be linked through an equilibrium.

Consumption & Income

  • Overspending: liquidity constraint
  • Underspending: not optimizing
  • Utility Maximization

Cointegration

Code
pi <- read.csv("../data/inc_pce.csv")
pi$DATE <- ymd(pi$DATE) # ymd() is from the lubridate package
par(mfrow = c(1, 2))
plot(pi$DATE, pi$PI, type = "l",
     ylim = range(pi$PI, pi$PCE),
     xlab = "Date", ylab = "log($)",
     col = "tomato")
lines(pi$DATE, pi$PCE, type = "l", col = "dodgerblue")
legend("topleft", legend = c("PI", "PCE"),
       col = c("tomato", "dodgerblue"),
       lty = 1, bty = "n")

plot(pi$DATE, log(pi$PI), type = "l",
     ylim = log(range(pi$PI, pi$PCE)),
     xlab = "Date", ylab = "log($)",
     col = "tomato")
lines(pi$DATE, log(pi$PCE), type = "l", col = "dodgerblue")
legend("topleft", legend = c("PI", "PCE"),
       col = c("tomato", "dodgerblue"),
       lty = 1, bty = "n")

Cointegration

Code
reg <- lm(log(PCE) ~ log(PI), data = pi)
summary(reg)

Call:
lm(formula = log(PCE) ~ log(PI), data = pi)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.059112 -0.014470 -0.003781  0.016900  0.057097 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.3604532  0.0056156  -64.19   <2e-16 ***
log(PI)      1.0134213  0.0006801 1490.18   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.02196 on 730 degrees of freedom
Multiple R-squared:  0.9997,    Adjusted R-squared:  0.9997 
F-statistic: 2.221e+06 on 1 and 730 DF,  p-value: < 2.2e-16

Cointegration

Code
plot(pi$DATE,
     reg$residuals,
     xlab = "Time", ylab = "",
     main = expression(paste("Residuals from: log(PCE"[t], ") = ", alpha, " + ", beta, "log(PI"[t], ")")),
     type = "l")

Cointegration

Since the residuals are not \(I(0)\) (the ADF test below fails to reject a unit root), consumption and income are not cointegrated.

Code
tseries::adf.test(reg$residuals)

    Augmented Dickey-Fuller Test

data:  reg$residuals
Dickey-Fuller = -2.9276, Lag order = 9, p-value = 0.1857
alternative hypothesis: stationary

Cointegration

Error Correction Model: the change in one series is explained in terms of:

  • the lagged gap between the series (the equilibrium error, \(z_{t-1}\))
  • lags of the first differences of each series

\[\begin{align}\Delta y_t &= \sum_{i = 1}^{p} \alpha_{i} \Delta y_{t-i} \\ &+ \sum_{j = 1}^{q} \beta_{j} \Delta x_{t-j} \\ &+ \gamma z_{t-1} + \mu + e_t\end{align}\]

Cointegration

Suppose income and consumption were in fact cointegrated. How would we estimate the model?

Code
pi_ts <- ts(pi, start = c(1959, 1), frequency = 12)
pi_ts <- ts.intersect(pi_ts,
                      pc = diff(log(pi_ts[,"PCE"])),
                      pi = diff(log(pi_ts[,"PI"])))
colnames(pi_ts) <- c("DATE", "PI", "PCE", "pc", "pi")

pi_ts <- ts.union(pi_ts, reg$residuals[2:length(reg$residuals)])
colnames(pi_ts) <- c("DATE", "PI", "PCE", "pc", "pi", "z")

Cointegration

Code
acf(pi_ts[,c("pc", "pi")])

Code
pacf(pi_ts[,c("pc", "pi")])

Cointegration

Code
# pi_ts <- ts.union(pi_ts, reg$residuals[2:length(reg$residuals)])
# colnames(pi_ts) <- c("DATE", "PI", "PCE", "pc", "pi", "z")

rmv.q <- c() # y-vars
rmv.p <- list(pi = c(), z = c(0, 2:12)) # x-vars
rmv <- list(p = rmv.p, q = rmv.q)
main <- ardlDlm(formula = pc ~ pi + z,
              data = as.data.frame(pi_ts),
              p = 12,
              q = 12,
              remove = rmv)
summary(main)

Time series regression with "ts" data:
Start = 13, End = 731

Call:
dynlm(formula = as.formula(model.text), data = data)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.0218796 -0.0024900 -0.0000841  0.0027765  0.0236268 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.0024010  0.0005351   4.487 8.45e-06 ***
pi.t         0.2577396  0.0367299   7.017 5.41e-12 ***
pi.1         0.1427268  0.0388892   3.670 0.000261 ***
pi.2         0.1390537  0.0390799   3.558 0.000399 ***
pi.3         0.1208001  0.0394285   3.064 0.002270 ** 
pi.4         0.0336543  0.0396888   0.848 0.396757    
pi.5         0.0087431  0.0395938   0.221 0.825297    
pi.6        -0.0454194  0.0394691  -1.151 0.250229    
pi.7        -0.0369035  0.0394333  -0.936 0.349679    
pi.8        -0.0806267  0.0393097  -2.051 0.040636 *  
pi.9        -0.0231455  0.0392077  -0.590 0.555162    
pi.10       -0.0036435  0.0387130  -0.094 0.925045    
pi.11       -0.0053696  0.0379922  -0.141 0.887647    
pi.12       -0.0068108  0.0370442  -0.184 0.854181    
z.1         -0.0356259  0.0102078  -3.490 0.000514 ***
pc.1        -0.2174941  0.0384251  -5.660 2.22e-08 ***
pc.2        -0.0871550  0.0397149  -2.195 0.028530 *  
pc.3        -0.0487803  0.0401043  -1.216 0.224271    
pc.4        -0.0217107  0.0403061  -0.539 0.590305    
pc.5         0.0194078  0.0404392   0.480 0.631432    
pc.6         0.1180546  0.0404596   2.918 0.003639 ** 
pc.7         0.0959361  0.0407643   2.353 0.018880 *  
pc.8         0.0794330  0.0409547   1.940 0.052842 .  
pc.9         0.0548218  0.0408454   1.342 0.179978    
pc.10       -0.0098468  0.0405019  -0.243 0.807985    
pc.11        0.0510592  0.0398659   1.281 0.200702    
pc.12        0.0079616  0.0382834   0.208 0.835319    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.004938 on 692 degrees of freedom
Multiple R-squared:  0.2077,    Adjusted R-squared:  0.1779 
F-statistic: 6.976 on 26 and 692 DF,  p-value: < 2.2e-16

Cointegration

What does a negative coefficient mean?

The residual is positive when \(\text{consumption} > \alpha + \beta \times\text{income}\). Remember, these two things are \(\approx\) when in equilibrium.

In other words, if last period's consumption was above its equilibrium level, this period's change in consumption will be smaller, pulling the relationship back toward equilibrium.

Next Classes

11/16:

  • VAR
  • Coefficient Dynamics

11/23:

  • No Class

11/30:

  • Forecast Competition
  • Project Help

12/7:

  • Final Presentations